Microsoft
has invested heavily in .NET Framework 4 and Visual Studio 2010 to
allow all developers to safely and easily embrace parallel programming
concepts within applications. The main driver for the Parallel
Programming model is simply that the era of massive doubling and
redoubling of processor clock speed is apparently over, and therefore
the era of our code automatically doubling in speed as hardware
improves is also coming to an end. This free performance ride is
popularly attributed to Moore's Law (proposed by Intel co-founder
Gordon Moore, who predicted a doubling of transistor density roughly
every two years), and its breakdown, as it is commonly known, was
brought about by thermal and power constraints within the silicon
processor at the heart of a PC. To counter this roadblock and to allow
computing speed to continue to scale, processor manufacturers and
computer hardware designers have simply added more than one processor
core to the hardware that runs code.
However,
not all programming techniques, languages, compilers, or developers for
that matter automatically produce code that takes advantage of multiple
cores. This leaves a lot of unused processing power on the table: CPU
processing power that could improve the responsiveness and user
experience of applications.
Microsoft’s MSDN Magazine published an article titled “Paradigm Shift—Design Considerations for Parallel Programming” by David Callahan,
which offers insight into the drivers behind, and Microsoft’s approach
to, parallel programming techniques. It begins by setting the scene:
...today,
performance is improved by the addition of processors. So-called
multicore systems are now ubiquitous. Of course, the multicore approach
improves performance only when software can perform multiple activities
at the same time. Functions that perform perfectly well using
sequential techniques must be written to allow multiple processors to
be used if they are to realize the performance gains promised by the
multiprocessor machines.
And concludes with the call to action:
The
shift to parallelism is an inflection point for the software industry
where new techniques must be adopted. Developers must embrace
parallelism for those portions of their applications that are time
sensitive today or that are expected to run on larger data sets
tomorrow.
The
main take-away regarding parallel programming drivers is that there is
no longer a free application performance boost with every hardware CPU
speed upgrade; for applications to run faster in the future,
programming techniques that support multiple processors (and cores)
need to be the standard approach. The techniques employed must also not
be limited to the number of CPU cores the code was originally authored
for. The application needs to detect and automatically embrace the
cores available on the executing hardware, whether that be 2 cores or
64 cores (and tomorrow's machines will likely offer orders of magnitude
more processing power). Code must be authored in a way that scales
accordingly without specifically compiled versions.
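This core-agnostic approach is exactly what the .NET Framework 4 Task Parallel Library is designed for. As a minimal sketch (the squaring loop is just a placeholder workload), the same binary adapts to however many cores the machine exposes:

```csharp
using System;
using System.Threading.Tasks;

class ScalingSketch
{
    static void Main()
    {
        // Environment.ProcessorCount reports the logical CPUs the OS
        // exposes, whether that is 2 or 64; no recompilation required.
        Console.WriteLine("Logical CPUs: " + Environment.ProcessorCount);

        // Parallel.For partitions the iteration range across the cores
        // available at run time, so the same code scales with hardware.
        long[] squares = new long[1000];
        Parallel.For(0, squares.Length, i =>
        {
            squares[i] = (long)i * i; // placeholder for real per-item work
        });

        Console.WriteLine(squares[999]); // 998001
    }
}
```

The point of the sketch is that no core count appears anywhere in the source; the runtime decides how to spread the iterations across the hardware it finds.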
History of Processor Speed and Multicore Processors
Looking
back at processor history, there was a 1GHz processor in 2000, which
doubled in speed to 2GHz in 2001 and topped 3GHz in 2002; however,
there has been a long hiatus since then in processor speed increasing
at that rate. In 2008, processor clock speeds were still only just
approaching 4GHz. In fact, when clock speed stopped increasing, so did
manufacturers' marketing of processor speed; clock speed was replaced
by various measures of instructions per second.
Power consumption, heat dissipation, and memory latency are just some
of the limiting factors halting pure CPU clock-speed increases. Another
technique for improving CPU performance had to be found in order to
keep pace with consumer demand.
The
limit of pure clock-speed scaling wasn't a surprise to the industry as
a whole; Intel engineers, in an article published in the October 1989
issue of IEEE Spectrum (“Microprocessors Circa 2000”), predicted the
use of multicore processor architecture to improve the end-user
experience when using PCs. Intel delivered on that promise in 2005, as
did competing processor companies, and it is almost certain that any
computer bought today has multiple cores built into each microprocessor
chip and, for the lucky few, multiple microprocessor chips built into
the motherboard. Rather than straight improvement in processor clock
speed, there are now more processor cores to do the work. Intel's
whitepaper “Intel Multi-Core Processor Architecture Development
Backgrounder” clearly defines what “multicore processors” consist of:
Explained
most simply, multi-core processor architecture entails silicon design
engineers placing two or more Intel Pentium processor-based “execution
cores,” or computational engines, within a single processor. This
multi-core processor plugs directly into a single processor socket, but
the operating system perceives each of its execution cores as a
discrete logical processor with all the associated execution resources.
The
idea behind this implementation of the chip’s internal architecture is
in essence a “divide and conquer” strategy. In other words, by divvying
up the computational work performed by the single Pentium
microprocessor core in traditional microprocessors and spreading it
over multiple execution cores, a multi-core processor can perform more
work within a given clock cycle. Thus, it
is designed to deliver a better overall user experience. To enable this
improvement, the software running on the platform must be written such
that it can spread its workload across multiple execution cores. This
functionality is called thread-level parallelism or “threading.”
Applications and operating systems (such as Microsoft Windows XP) that
are written to support it are referred to as “threaded” or
“multi-threaded.”
The
final sentence of this quote is important: “Applications and operating
systems...that are written to support it...” Although the operating
system running your code almost certainly supports multi-threading, not
all applications are coded in a fashion that fully exploits that
ability. In fact, in most cases the current use of multi-threading in
applications is to improve the perceived performance of an application
rather than its actual performance, a subtle distinction to be explored
shortly.
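The thread-level parallelism the whitepaper describes can be sketched with nothing more than the System.Threading.Thread class; the range-summing work here is a hypothetical stand-in for a real workload:

```csharp
using System;
using System.Threading;

class ThreadingSketch
{
    static long sumLow, sumHigh;

    static void Main()
    {
        // Divide the work into two activities that the operating system
        // can schedule on different execution cores at the same time.
        Thread low = new Thread(() => { sumLow = Sum(1, 500000); });
        Thread high = new Thread(() => { sumHigh = Sum(500001, 1000000); });

        low.Start();
        high.Start();

        // Wait for both threads to finish before combining the results.
        low.Join();
        high.Join();

        Console.WriteLine(sumLow + sumHigh); // 500000500000
    }

    static long Sum(int from, int to)
    {
        long total = 0;
        for (int i = from; i <= to; i++) total += i;
        return total;
    }
}
```

On a single-core machine the two threads simply time-slice, which is why threading has historically improved perceived rather than actual performance; on a multicore machine they can genuinely run at the same time.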
The
operating system running on your PC (or server) exploits all processor
cores in all the physical processors available to it, aggregating these
as a total number of available CPUs. For example, a 4-processor machine
running 4-core processors shows in the operating system as 16 CPUs
(multiply the number of physical processors by the number of cores in
each processor). Sixteen CPUs is now common in server machines, and the
number of CPUs is increasing due to growth in both physical socket
counts and per-processor core counts. Expect machines with 32, 64, or
even 128+ CPUs to be available at a commercial level now and at a
consumer level shortly.
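The aggregated CPU count the operating system reports is exposed directly to .NET code through Environment.ProcessorCount; the value printed depends entirely on the machine this sketch runs on:

```csharp
using System;

class CpuCount
{
    static void Main()
    {
        // Environment.ProcessorCount returns the total logical CPUs the
        // OS reports: physical sockets multiplied by cores per socket
        // (doubled again where Hyper-Threading or similar is enabled).
        int cpus = Environment.ProcessorCount;
        Console.WriteLine("This machine exposes " + cpus + " CPUs.");
    }
}
```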